ImmunoCleaner Vignette

Introduction

The purpose of this package is twofold. First, it can download, combine and tidy data sets from 10x Genomics (see README) and prepare them for filtering. Second, the data can be visualised in several ways through the functions exported by the package.

In the following, a workflow is showcased. A baseline filtering is compared to a stricter filtering by inspecting the output of selected functions. Note that not all functions are showcased. For the remaining functions, and to get a better understanding of the methodology, see the Technical Report in doc/TechnicalReport/ImmunoCleaner.html.

Tidying and filtering the data is important if further analyses are needed, because the raw data contain a lot of clutter. For example, the avidity of an interaction between a TCR and a pMHC is measured through the UMI-counts, but there is no ground truth as to what threshold lets us trust an interaction. Therefore, we need tools to change these thresholds while visualising the outcome until some goal is reached.

Example of a workflow

First, the package is loaded along with gt, which is used to format tables:

library(ImmunoCleaner)
library(gt)

We define the baseline filtering as the exported data set with the 10x Genomics standards applied. This means that a relevant binding between a TCR and a pMHC has a UMI-count greater than 10, and 5 times greater than the negative control with the highest UMI-count for that cell. This default is already applied to the data set, so we simply filter:

data_baseline <- data_combined_tidy %>% 
  dplyr::filter(is_binder == TRUE)
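
To make the baseline rule concrete, the following is a minimal sketch in base R on a toy data frame. The column names (barcode, UMI_count, max_neg_ctrl) are hypothetical and not taken from the package, and "5 times greater" is read here as strictly greater than 5 times the highest negative control; the package may implement the rule differently.

```r
# Toy data: one row per cell/pMHC pair. All column names are hypothetical.
toy <- data.frame(
  barcode      = c("cell1", "cell2", "cell3", "cell4"),
  UMI_count    = c(12, 60, 8, 30),
  max_neg_ctrl = c(1, 20, 0, 5)  # highest negative-control UMI-count in that cell
)

# Rule as described in the text (assumed strict inequalities):
# UMI-count > 10 AND UMI-count > 5 x the highest negative control.
toy$is_binder <- toy$UMI_count > 10 & toy$UMI_count > 5 * toy$max_neg_ctrl

# Keep only the rows flagged as relevant binders
toy_binders <- toy[toy$is_binder, ]
toy_binders
```

In this toy example, cell2 has a high UMI-count but fails the negative-control criterion, while cell3 fails the minimum UMI-count.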

For the stricter filtering, we only include HLA-matches which are TRUE, and increase the UMI-count threshold to 40. A TRUE match means that the allele of the pMHC matches the haplotype of the donor.

data_strict <- data_combined_tidy %>% 
  dplyr::filter(HLA_match == "TRUE") %>% 
  evaluate_binder(UMI_count_min = 40) %>% 
  dplyr::filter(is_binder == TRUE)

After applying the stricter filtering, it would be interesting to see how much data is retained:

data_strict %>%
  dplyr::filter(HLA_match == "TRUE") %>% 
  percentage_rows_kept()
donor    percentage_left
donor1             56.97
donor2             63.06
donor3              0.03
donor4              0.71

Approximately 40% of the data points are gone for donor1 and donor2, and more than 99% are gone for donor3 and donor4. It is worth investigating why the change in the number of data points is so different across donors. We see the count of relevant binders stratified by HLA-match and donor for the baseline:

data_baseline %>% 
  count_binding_pr_allele()

For donor1 and donor2, the vast majority of interactions are TRUE matches. For donor3 and donor4, we see the opposite. The strict filtering removes all but the TRUE matches, which explains the sharp drop in retained data points for donor3 and donor4.
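
The link between the share of TRUE matches and the amount of retained data can be illustrated with a small base R sketch. The toy data and donor labels below are made up for illustration and do not reproduce the package's data or the output of count_binding_pr_allele().

```r
# Toy data: one row per relevant binder (hypothetical values)
toy <- data.frame(
  donor     = c("donorA", "donorA", "donorA", "donorA",
                "donorB", "donorB", "donorB", "donorB"),
  HLA_match = c("TRUE", "TRUE", "TRUE", "FALSE",
                "FALSE", "FALSE", "FALSE", "TRUE")
)

# Count binders per donor and HLA-match status
counts <- table(toy$donor, toy$HLA_match)
counts

# Percentage of TRUE matches per donor: an upper bound on what can be
# retained once only TRUE matches are kept
pct_true <- 100 * counts[, "TRUE"] / rowSums(counts)
pct_true
```

A donor dominated by FALSE matches, such as donorB here, loses most of its data points before the UMI-count threshold is even applied.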

The implications of the filtering can be observed through relevant_binders_plot(). This function outputs a scatter plot in which each point represents a relevant binder. The size of a dot represents the support for the specific interaction. The colouring, called concordance, represents the number of interactions supporting the specific TCR and pMHC out of all interactions for that TCR. First, we see the plot with baseline filtering:

data_baseline %>% 
  relevant_binders_plot()

As shown, there is a large amount of clutter. Many of the data points carry a low concordance, meaning the TCR interacted with the specific pMHC only a few times compared to the other pMHCs the same TCR interacts with. We expect a TCR to be specific for a single pMHC, whereas a pMHC can interact with different TCRs.
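
As an illustration of the concordance described above, the following base R sketch computes it on toy data as the number of interactions for a TCR and pMHC pair divided by all interactions for that TCR. The column names and values are hypothetical, and the package may compute concordance differently in detail.

```r
# Toy data: one row per observed interaction (hypothetical values)
toy <- data.frame(
  TCR  = c("tcr1", "tcr1", "tcr1", "tcr1", "tcr2", "tcr2"),
  pMHC = c("A", "A", "A", "B", "C", "C")
)

# Support: number of interactions per TCR/pMHC pair
pair_n <- aggregate(n ~ TCR + pMHC, data = transform(toy, n = 1), FUN = sum)

# Total number of interactions per TCR, repeated for each of its pairs
tcr_total <- ave(pair_n$n, pair_n$TCR, FUN = sum)

# Concordance: share of a TCR's interactions that support this pMHC
pair_n$concordance <- pair_n$n / tcr_total
pair_n
```

Here tcr2 only ever binds pMHC C and has a concordance of 1, whereas tcr1 is split between A and B, so its pair with B carries a low concordance.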

When using the strict filtering, the plot is as follows:

data_strict %>% 
  relevant_binders_plot()

The plots for donor1 and donor2 are less cluttered and show less ambiguity. As we saw at the beginning of the section, the number of data points left for donor3 and donor4 after applying the strict filtering is very limited. Fewer than 30 data points are left in total for the two donors, which is too few to support meaningful further analyses.

Applying another set of filters and rerunning the functions is a good way to work towards optimal settings. The filters could even be applied donor-wise if desired.
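
A donor-wise filter could look roughly like the following base R sketch. The per-donor thresholds, column names and toy values are hypothetical, and the comparison is assumed to be strict ("greater than"); in practice the thresholds would be passed to the package's own filtering functions.

```r
# Toy data: one row per interaction (hypothetical values)
toy <- data.frame(
  donor     = c("donorA", "donorA", "donorB", "donorB"),
  UMI_count = c(45, 20, 45, 20)
)

# Hypothetical minimum UMI-count per donor
min_umi <- c(donorA = 40, donorB = 15)

# Look up each row's threshold by donor name and compare
keep <- toy$UMI_count > min_umi[as.character(toy$donor)]

toy_filtered <- toy[keep, ]
toy_filtered
```

With these settings, the stricter threshold only affects donorA, so donorB retains both of its data points.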

Shiny integration

The above filters and visualisations have been implemented in a Shiny App, allowing for user-friendly interaction instead of writing code. As the Shiny App is built using this package, there is no difference in the filters or the output of the functions. The App can be found here.